Skip to content

feat(utility): constexpr string-parse primitives (Phase A of #835)#838

Merged
Ravenwater merged 3 commits into
mainfrom
feat/issue-835-string-parse-foundation
May 16, 2026
Merged

feat(utility): constexpr string-parse primitives (Phase A of #835)#838
Ravenwater merged 3 commits into
mainfrom
feat/issue-835-string-parse-foundation

Conversation

@Ravenwater

@Ravenwater Ravenwater commented May 16, 2026

Copy link
Copy Markdown
Contributor

Summary

Why this needs a foundation first

A pre-implementation survey (see issue #835 for the roadmap) found that today's parse() implementations are reimplemented per type, with divergent format coverage:

  • lns refuses decimal value strings while fixpnt accepts them
  • posit accepts a custom nbits.esXhexvalue form not found anywhere else
  • dfloat accepts decimal but not binary bit-patterns; lns is the opposite

Without a shared scanner, every type that adopts string parsing will continue this divergence. Phase A fixes that.

What's in the header

sw::universal::string_parse namespace:

Primitive Returns Notes
scan_prefix(string_view) {number_base, body} 0b/0o/0x detection (case-insensitive); no prefix -> decimal
scan_sign(string_view) {negative, rest} +/- detection; no sign -> positive
parse_binary/octal/hex(string_view) {value, digits, overflow, valid} MSB-first uint64_t accumulation with overflow detection
scan_decimal_float(string_view) {valid, negative, int_part, frac_part, exp10} [-+]?int.frac[eE][-+]?exp tokenized to string_views over input

Plus character classifiers (is_decimal_digit, is_hex_digit, ...) and a hex_digit_value helper.

Scope (what's explicitly out)

Phase A does NOT perform value reconstruction (turning digit strings into a number type's bit representation). That step depends on the target type's precision and rounding rules and so belongs in each type's own parse() in Phase B and later.

scan_decimal_float returns string_views pointing into the input, plus a signed int32_t exponent -- so the caller can iterate the digits and accumulate into whatever representation it wants (uint64_t, multi-limb blockbinary, decimal accumulator, ...).

Tests

static/utility/test_string_parse.cpp -- 57 test assertions + 8 static_assert smoke tests proving constexpr.

  • scan_prefix: 0b/0B/0o/0O/0x/0X case variants, decimal default, leading-0-without-prefix, empty input
  • scan_sign: '+', '-', no sign, empty, double-sign (first only consumed)
  • parse_binary/octal/hex: valid input, partial-stop-at-invalid-char, 64-bit boundary (no overflow), 65/17-digit overflow detection, empty -> invalid
  • scan_decimal_float: integer-only, fraction-only, integer+fraction, positive/negative exponent, mixed signs, malformed (bare dot, trailing garbage, "1e" no exp digits), 50-digit fraction (constexpr-friendly: views point into input, no allocation)

Test results

Target gcc build gcc test clang build clang test
utility_test_string_parse OK 57/57 PASS OK 57/57 PASS

Phased roadmap (for context; only Phase A is in this PR)

Phase Scope Size
A. Foundation (this PR) Shared constexpr string-parse utility + tests M
B. Conforming uplift Migrate posit/cfloat/integer/fixpnt onto this M-L
C. Partial uplift lns/dbns/areal/dfloat/hfloat/dfixpnt/edecimal/unum1 L
D. Greenfield dd/qd/bfloat16/valid/microfloats/elastics/block formats XL
E. Cross-cutting tests Format x type coverage matrix M

Test plan

  • Fast CI passes (gcc + clang CI_LITE)
  • CodeRabbit feedback addressed
  • Promote to ready when satisfied: `gh pr ready `

Relates to #835

Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added allocation-free, constexpr-capable parsing primitives: sign and base-prefix detection, digit classifiers, MSB-first binary/octal/hex integer parsing with digit count and overflow reporting, and a decimal floating-point tokenizer with optional fractional and exponent parts.
  • Tests

    • Added a standalone test suite with compile-time assertions and runtime tests covering valid, invalid, edge, and overflow cases.

Review Change Stack

Adds include/sw/universal/utility/string_parse.hpp -- a shared foundation
of constexpr-friendly primitives that the number systems will adopt in
later phases of issue #835.

Why
---
Today's parse() implementations are reimplemented per type (posit,
cfloat, integer, fixpnt, lns, ...), each with its own scanner. Format
coverage has diverged (lns refuses decimal value strings while fixpnt
accepts them; posit accepts a custom "nbits.esXhex" form; etc.). Without
a shared foundation, every type that adopts string parsing will continue
this divergence.

This file is the foundation. All primitives operate on std::string_view
and are constexpr -- they can be called from constexpr ctors once the
number systems are migrated in Phase B.

What's in the header
--------------------
sw::universal::string_parse namespace:

  enum number_base { binary, octal, decimal, hex, unknown };

  prefix_scan         scan_prefix(string_view);      // 0b/0o/0x -> base + body
  sign_scan           scan_sign(string_view);        // +/- detection
  bit_pattern_result  parse_binary/parse_octal/parse_hex(string_view);
                                                     // MSB-first uint64_t with overflow
  decimal_float_scan  scan_decimal_float(string_view);
                                                     // [-+]?int.frac[eE][-+]?exp
                                                     // returns string_views into input
                                                     // plus signed int32 exponent

Plus character classifiers (is_decimal_digit, is_hex_digit, ...) and a
hex_digit_value helper.

Phase A explicitly does NOT perform value reconstruction (turning
digit strings into a number type's bit representation). That step
depends on the target type's precision and rounding rules and so
belongs in each type's own parse() in Phase B.

Tests
-----
static/utility/test_string_parse.cpp covers all primitives with 57 test
assertions plus 8 static_assert smoke tests proving constexpr.

  - scan_prefix: 0b/0B/0o/0O/0x/0X case variants, decimal default,
    leading-0-without-prefix, empty input
  - scan_sign: '+', '-', no sign, empty, double-sign (first only)
  - parse_binary/octal/hex: valid input, partial-stop-at-invalid-char,
    64-bit boundary (no overflow), 65/17-digit overflow detection,
    empty -> invalid
  - scan_decimal_float: integer-only, fraction-only, integer+fraction,
    positive/negative exponent, mixed signs, malformed (bare dot,
    trailing garbage, "1e" no exp digits), 50-digit fraction

Passes 57/57 on gcc and clang.

This commit closes nothing in #835 by itself. Phase B (posit/cfloat/
integer/fixpnt uplift onto this foundation) will reference the same
issue and start consuming these primitives.

Relates to #835

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 16, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c469402b-30b2-47a7-b62c-58b6db36e10c

📥 Commits

Reviewing files that changed from the base of the PR and between 3130088 and e96fa05.

📒 Files selected for processing (2)
  • include/sw/universal/utility/string_parse.hpp
  • static/utility/test_string_parse.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • static/utility/test_string_parse.cpp
  • include/sw/universal/utility/string_parse.hpp

📝 Walkthrough

Walkthrough

Adds a new header string_parse.hpp implementing constexpr ASCII-only scanning primitives (base/sign detection, digit helpers, binary/octal/hex accumulation, decimal-float tokenizer), a standalone test executable validating compile-time and runtime behavior, and a small CI workflow scope update allowing utility.

Changes

String Parsing Primitives

Layer / File(s) Summary
String Parsing Primitives Library
include/sw/universal/utility/string_parse.hpp
Defines number_base enum and result structs (prefix_scan, sign_scan, bit_pattern_result, decimal_float_scan). Provides constexpr functions: scan_prefix() and scan_sign(); digit helpers (is_binary_digit, is_octal_digit, is_decimal_digit, is_hex_digit, hex_digit_value); base‑N parsers (parse_binary, parse_octal, parse_hex) that MSB‑first accumulate into std::uint64_t with digit count and overflow/valid flags; and scan_decimal_float() tokenizing decimal literals with optional sign, integer part, fractional part, and exponent.
Parsing Primitives Test Suite
static/utility/test_string_parse.cpp
Minimal test harness with global counters and report() helper. Adds static_assert compile‑time smoke tests for primitives and runtime test functions covering valid, invalid, edge cases (case variants, empty input, overflow boundaries, trailing garbage). Implements main() runner that reports results and returns failure on any test failure.

CI Conventional Commits

Layer / File(s) Summary
Conventional commits scope update
.github/workflows/conventional-commits.yml
Adds utility to the allowed scopes list for the semantic PR title validation action.

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

enhancement

🐰 Whiskers twitch with parsing glee,
Constexpr strings dance, fast and free!
No allocations, no fuss—
Just scanning that builds trust in us.
Binary, octal, decimal too,
A parsing feast, both old and new!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main change: adding constexpr string-parsing primitives to the utility module, referencing the relevant issue phase.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/issue-835-string-parse-foundation

Comment @coderabbitai help to get the list of available commands and usage tips.

@Ravenwater Ravenwater self-assigned this May 16, 2026
@Ravenwater Ravenwater moved this to In progress in Universal Number Library May 16, 2026
@Ravenwater Ravenwater added this to the V4 milestone May 16, 2026
The PR title for #838 uses feat(utility): which the conventional-commits
check rejected because "utility" wasn't in the scope list. The library
has an actual include/sw/universal/utility/ directory for shared
infrastructure (bit_cast, find_msb, convert_to, ...), so the scope is
legitimate and will recur for any future utility additions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@static/utility/test_string_parse.cpp`:
- Around line 254-334: Add tests in test_scan_decimal_float to cover int32
boundary exponents because scan_decimal_float delegates exponent parsing to
detail::parse_int32; specifically, add cases asserting that "1e2147483647"
yields valid with exp10 == INT32_MAX, "1e-2147483648" yields valid with exp10 ==
INT32_MIN, and that "1e2147483648" is rejected (invalid) to verify overflow
handling in scan_decimal_float/detail::parse_int32.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2246d01e-9a76-4a77-bc75-6f967ceedf31

📥 Commits

Reviewing files that changed from the base of the PR and between 8253109 and eae930d.

📒 Files selected for processing (2)
  • include/sw/universal/utility/string_parse.hpp
  • static/utility/test_string_parse.cpp

Comment thread static/utility/test_string_parse.cpp
Addresses the round-1 CodeRabbit review of #838: add tests covering the
int32 exponent boundary in scan_decimal_float.

Writing the requested tests surfaced a real bug: parse_int32 used a
single bound v > 2147483647 to reject overflow, which (correctly) rejects
positive values above INT32_MAX but (incorrectly) also rejects the
well-formed representation of INT32_MIN (where |INT32_MIN| = 2^31 =
2147483648). For "1e-2147483648" the accumulator reaches 2147483648
before the sign is applied, so the check fires too early.

Fix: track sign first and pick the bound conditionally
    positive: at most INT32_MAX     = 2147483647
    negative: at most |INT32_MIN|   = 2147483648
The int64 -> int32 cast at the end yields INT32_MIN for v = 2147483648
under C++20's mandated two's-complement.

Added tests:
- "1e2147483647"   -> valid, exp10 == INT32_MAX
- "1e-2147483648"  -> valid, exp10 == INT32_MIN
- "1e2147483648"   -> invalid (overflow on positive side)

60/60 tests pass on gcc and clang.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Ravenwater Ravenwater marked this pull request as ready for review May 16, 2026 10:40
@Ravenwater Ravenwater merged commit 06d237b into main May 16, 2026
36 of 37 checks passed
@Ravenwater Ravenwater deleted the feat/issue-835-string-parse-foundation branch May 16, 2026 10:54
@github-project-automation github-project-automation Bot moved this from In progress to Done in Universal Number Library May 16, 2026
@coveralls

Copy link
Copy Markdown

Coverage Report for CI Build 25959866834

Warning

Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes.
Quick fix: rebase this PR. Learn more →

Coverage increased (+0.007%) to 83.953%

Details

  • Coverage increased (+0.007%) from the base build.
  • Patch coverage: No coverable lines changed in this PR.
  • 8 coverage regressions across 1 file.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

8 previously-covered lines in 1 file lost coverage.

File Lines Losing Coverage Coverage
include/sw/universal/number/cfloat/cfloat_impl.hpp 8 93.48%

Coverage Stats

Coverage Status
Relevant Lines: 54832
Covered Lines: 46033
Line Coverage: 83.95%
Coverage Strength: 6383955.66 hits per line

💛 - Coveralls

Ravenwater added a commit that referenced this pull request May 16, 2026
…hase B1 of #835) (#839)

* feat(integer, fixpnt): migrate/implement parse() onto string_parse (Phase B1 of #835)

Phase B1 of issue #835: bring `integer` and `fixpnt` onto the shared
constexpr string-parsing foundation that landed in Phase A (#838).

integer
-------
Replaces the std::regex + std::map + reverse-iteration parser body in
parse(const std::string&, integer&) with a clean MSB-first scan that
delegates prefix/sign detection and character classification to the
`sw::universal::string_parse` primitives.

Notable functional changes:
  - The previous octal branch was a `// TODO` that always returned false
    after detecting C-style leading-zero octal. Replaced with a real
    implementation gated on the explicit `0o` / `0O` prefix.
  - Binary (`0b` / `0B`) is newly supported.
  - Hex still accepts apostrophe as a digit separator (e.g. `0xDE'AD').
  - Decimal accepts a single optional leading `+` or `-`.
  - Drops the `<regex>` and `<map>` includes from integer_impl.hpp.

fixpnt
------
`fixpnt::parse()` was previously forward-declared (fixpnt_fwd.hpp:22) and
called by `operator>>` (fixpnt_impl.hpp:2073) but had no definition --
a latent link error that nobody had hit because no test exercises stream
input on fixpnt. This PR provides the definition.

Accepted syntax:
  [+-]? ( 0[bB][01]+ | 0[oO][0-7]+ | 0[xX][0-9A-F']+ | [0-9]+ )

Bit-pattern parsing (binary / octal / hex) fills the underlying storage
MSB-first via setbit(0) + left shift -- matches the convention of
setbits(). Decimal parsing is integer-only in this phase: the digit
string is accumulated as an integer K and stored as `K << rbits`, so
parse("5") on fixpnt<8,4> yields the value 5.0. Decimal-fraction
parsing ("3.14") is deferred to Phase B2 along with the analogous
float-from-string work for posit and cfloat.

Tests
-----
- static/integer/binary/api/string_parse.cpp -- 33 assertions covering
  decimal, binary, octal, hex with sign handling and rejection of
  malformed inputs across integer<8,16,32,64> widths.
- static/fixpnt/binary/api/string_parse.cpp -- 16 assertions covering
  the same format matrix on fixpnt<8,4>, <9,4>, <16,8> with explicit
  bit-pattern verification via setbits-comparison.

Both test targets pass 33/33 and 16/16 on gcc and clang. Existing
integer api regression (`bint_api`) unchanged.

Relates to #835

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(integer, fixpnt): address CodeRabbit review on PR #839

Four items from CodeRabbit's review of the Phase B1 PR:

1. **Actionable: fixpnt decimal branch UB + policy bypass** -- the previous
   implementation used `setbits(static_cast<uint64_t>(10) << rbits)` to
   construct 10.0 and the per-digit operand. This both (a) bypassed the
   Saturate / Modulo arithmetic policy that *= and += would otherwise
   enforce, and (b) invoked UB for any fixpnt instantiation with
   rbits >= 64. Replaced with FP(10) and FP(d) using the converting
   constructor from native int. Removed the dead one_ulp placeholder.

2. **Actionable: fixpnt hex branch silent-success on only-separator input** --
   a payload of just apostrophes (e.g. "0x'", "0x''") would skip every
   iteration via the separator continue and exit the loop with no real
   digit processed but no error, leaving value zero. Same gap existed in
   integer's parse() (and in binary / octal branches for any future
   conventions that introduce separators). Added a digit_found flag in
   every bit-pattern branch in both files; return false if no real digit
   was seen. Decimal keeps the flag for symmetry though is_decimal_digit
   already gates the loop body.

3. **Actionable: fixpnt operator>> failure semantics** -- the istream
   operator merely logged to std::cerr on parse failure, leaving the
   stream in a "successful" state so callers (e.g. `while (in >> x)`)
   couldn't detect the error. Now sets failbit via istr.setstate.

4. **Nitpick: integer test type mismatch** -- check_parse<16, std::uint8_t>
   was invoked with I8(15) / I8(64) as the expected value (relied on
   implicit narrowing via the converting constructor). Added an I16
   alias and used it for the 16-bit cases.

Test additions exercise the new rejection paths:
  - "0x'", "0x''", "0x'''"  -> integer and fixpnt both reject
Plus the type-corrected octal tests on I16.

Final counts: integer 36/36, fixpnt 18/18. Passes on gcc and clang.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(fixpnt): keep cerr diagnostic alongside the failbit on operator>> parse error

Round-2 (70d0ce6) replaced the cerr log with setstate(failbit) per CodeRabbit's
review of #839. CodeRabbit's prompt was explicit that the cerr message was
optional, but on reflection both belong: failbit is the programmatic signal
(while (in >> x) loops, etc.) and the cerr message helps interactive
debugging. posit's operator>> uses the same dual-signal pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(fixpnt): rename FP alias to Fixpnt for unambiguous fixed-point semantics

Per maintainer review on PR #839: the alias `FP` reads as "floating-point" to
most numerics readers, which is exactly the wrong association inside a
fixed-point parser. Renamed to `Fixpnt` (matching the class name) in both
sites:

  - include/sw/universal/number/fixpnt/fixpnt_impl.hpp parse() body
    (the `using FP = fixpnt<...>` alias and its two consumer references)
  - static/fixpnt/binary/api/string_parse.cpp test helper aliases
    (FP8_4 -> Fixpnt8_4, FP16_8 -> Fixpnt16_8)

Pure rename. No behavior change. Tests still 18/18 on gcc and clang.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(integer): symmetric failbit-on-parse-error in operator>>

Mirrors the fix applied to fixpnt::operator>> in this PR. integer<>'s
istream extractor previously only logged to std::cerr on parse failure,
leaving the stream in a "successful" state -- a `while (in >> x)` loop
would not terminate on bad input. Now sets failbit on failure
(programmatic detection signal) while keeping the cerr message
(interactive-debugging signal). Same dual-signal pattern as fixpnt.

Drive-by: the original cerr message read "into a posit value" -- a
copy-paste leftover from posit's identical extractor. Corrected to
"into an integer value".

bint_string_parse still 36/36 and bint_api passes; no behavior change
for the success path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(integer, fixpnt): commit parse result only on success

CodeRabbit's round-2 review (PR #839) flagged that the parsers cleared
the output value at entry and then mutated it in place, so any mid-parse
`return false` left the caller's object partially updated (e.g., zero
followed by a few bits, rather than the original state). Standard practice
for parse() is "leave output untouched on failure."

Refactored both integer and fixpnt parse() to build into a local temporary
(`Int tmp` / `Fixpnt tmp`) and only assign `value = tmp` at the very end
on a fully successful parse. Every existing `return false` path now exits
without touching the caller's `value`.

Two new regression assertions per file verify the contract:
  - Hex-with-invalid-trailing-char ("0x1G") on a pre-set value leaves it
    at the previous value.
  - Decimal-then-garbage ("123abc" / "12abc") same.

Both apply to integer and fixpnt for symmetry, even though CR only called
out fixpnt explicitly (the integer parser had the same pattern).

Final counts: integer 38/38, fixpnt 20/20 on gcc and clang.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(integer, fixpnt): adopt canonical ReportTestSuite pattern for parse() tests

Replaces the homegrown g_total/g_failures bench from the original Phase B1
tests with the convention used across hundreds of existing tests in
static/*/api/*: ReportTestSuiteHeader, scope-block test groups with
"int start = nrOfFailedTestCases" + per-assertion ++nrOfFailedTestCases,
trailing "if (nrOfFailedTestCases - start > 0) std::cout << \"FAIL: ...\""
diagnostic, and ReportTestSuiteResults at the end.

Also adds the standard catch-block footer (ad-hoc / arithmetic /
internal / runtime / ...).

No behavior change in what is tested -- same matrix of decimal / binary /
octal / hex / invalid-rejection / commit-on-success cases. Just the
reporter idiom now matches the codebase.

Output now reads:
  integer<> parse() string-parsing test suite: PASS
  fixpnt<>  parse() string-parsing test suite: PASS

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ravenwater added a commit that referenced this pull request May 16, 2026
…se foundation test (#840)

Follow-on cleanup for PR #838 (Phase A of #835). The foundation test
landed with a homegrown g_total/g_failures bench instead of the
ReportTestSuiteHeader / scope-block / ReportTestSuiteResults idiom used
by the rest of the Universal regression suite (and applied to the
integer/fixpnt parse() tests on PR #839).

Refactor mirrors that style:
  - ReportTestSuiteHeader at entry, ReportTestSuiteResults at exit
  - Each primitive (scan_prefix / scan_sign / parse_binary/octal/hex /
    scan_decimal_float) is one scope block: `int start = nrOfFailedTestCases`
    at the top, per-assertion `if (...) ++nrOfFailedTestCases`,
    trailing `if (nrOfFailedTestCases - start > 0) std::cout << "FAIL: ..."`
    diagnostic.
  - Standard catch footer (ad-hoc / runtime / ...)

Static_assert smoke tests at file scope (proving constexpr) are kept
verbatim -- they verify a different invariant (compile-time evaluation)
that the runtime block can't.

Same test matrix, same coverage. Output now reads:
  string_parse primitives (Phase A of #835) test suite: PASS

Relates to #835

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ravenwater added a commit that referenced this pull request May 18, 2026
…#865)

* feat(edecimal): extend parse to decimal-point and scientific notation

Previously edecimal::parse only accepted integer literals matching
[+-]*[0-9]+ via std::regex_match.  Anything with a decimal point or
an exponent suffix was rejected, even when the value was exactly
representable as an integer ("3.14e2" = 314).

Switch the tokenizer to sw::universal::string_parse::scan_decimal_float
(the foundation from #838) which yields int_part, frac_part, and a
signed exp10.  The effective decimal exponent is exp10 - frac.size():
when non-negative, the value is an exact integer and we accept it; when
negative the value has fractional digits that edecimal cannot represent
without precision loss, so we reject ("3.14", "1.5e-100", "0.001").
This matches the issue spec's "preserve those digits exactly" rule.

Accepted forms now include:
  "42", "-1000"              -- integer (unchanged)
  "3.14e2",   "-2.5e1"       -- decimal point with shift to integer
  "1e10",     "1.5e10"       -- pure exponent or compatible
  "3.14e+200"                -- 201-digit exact integer
  "5.", ".5e1"               -- edge syntax that scan_decimal_float allows

Side effects:
- Call unpad() after parsing so "0042" / "0.0042e4" no longer carry
  leading-zero limbs.
- Collapse "-0" / "-0.0e5" to +0 (no negative zero).

operator>> hygiene (failbit + extraction guard) was already shipped in
#858 (Phase E of #835); the test file now also pins it.

Test (elastic/decimal/conversion/string_parse.cpp) extended to 9 groups:
- integer parse (canonical + large)
- scientific exact (with the 201-digit "3.14e+200" reference)
- decimal-point form that produces an integer
- fractional input rejected (3.14, -0.5, 0.001, 1.5e-100, 3.0, 10.50e0)
- malformed reject (empty, alpha, "1e", ".", "1.2.3", "1e3.5", "42x", "0x1F")
- negative-zero collapse
- operator>> failbit on bad token
- operator>> success on scientific token in whitespace
- operator>> empty stream

Resolves #854

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(edecimal): cap parse() expansion to prevent unbounded allocation

CodeRabbit round 1 on #865 flagged a DoS surface: scan_decimal_float
returns an int32 exponent, so an input like "1e2000000000" would loop
push_back ~2 billion trailing zeros inside parse, allocating ~2 GiB.

Cap the post-expansion digit count at 1 MiB (1,048,576 digits) and use
vector::reserve + vector::insert in place of repeated push_back so the
accepted path also stays O(N) without amortized growth.

Test additions:
- "1e2000000000", "1e2147483647" (INT32_MAX), "1e10000000",
  "1e1048576" (cap+1) -- all rejected.
- The cap is exclusive: 1e1048575 (1 MiB significand) still parses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ravenwater added a commit that referenced this pull request May 18, 2026
#866)

Previously erational::parse only accepted integer literals matching
[+-]*[0-9]+ and left the denominator unset (relying on the prior
state, which defaulted to 1 only on a freshly constructed value).

Replace parse() with a routing function and a static helper
parse_decimal_to_fraction(s, num, den, neg) that uses scan_decimal_float
(the foundation from #838) to tokenize either an integer, a decimal,
or a scientific literal into an exact (numerator, denominator)
edecimal pair:

  "42"          -> num=42,  den=1
  "3.14"        -> num=314, den=100
  "1.5e2"       -> num=150, den=1
  "1.5e-1"      -> num=15,  den=10

For the p/q form, split on '/' and parse each half through the same
helper, then combine as (p_num * q_den) / (p_den * q_num).  Sign is
the XOR of the two sides.  Mixed forms like "3.14/2" and "1e2/2e1"
work because each side is independently a decimal/scientific literal.

  "1/2"        -> 1/2
  "-22/7"      -> -22/7
  "22/-7"      -> -22/7
  "4/8"        -> 1/2     (normalize() reduces via GCD)
  "3.14/2"     -> 157/100

Rejected forms:
  - q == 0 across all flavors: "1/0", "5/0.0", "0/0", "1/0e10"
    (erational has no NaR encoding, and silently representing
     infinity would mask downstream divide-by-zero detection)
  - Two slashes: "1/2/3"
  - Empty side: "1/", "/2", "/"
  - Malformed decimal: "3.14.15", "1e", ".", "42x"

Defensive cap: parse_decimal_to_fraction rejects any input whose
significand or denominator would exceed 2^20 (1,048,576) digits, the
same cap used by edecimal::parse (#854).

operator>> hygiene (failbit + extraction guard) was already shipped in
#858 (Phase E of #835); the test file now also pins it.

Test (elastic/rational/decimal/conversion/string_parse.cpp) extended
to 11 groups: integer, p/q with simplification, decimal-to-rational,
scientific, mixed p/q with decimal sides, q=0 rejection, malformed,
negative-zero collapse, operator>> failbit on bad, operator>> on a
p/q token in whitespace, operator>> empty stream.

Resolves #855

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants